Gesture Recognition

In this group project, you are going to build a 3D convolutional model that correctly predicts five different gestures.

                                                                                           - Trisit Kumar Chatterjee
                                                                                           - Subhasis Jethy

Problem Statement

Imagine you are working as a data scientist at a home-electronics company that manufactures state-of-the-art smart televisions. You want to develop a cool feature in the smart TV that can recognise five different gestures performed by the user, letting users control the TV without using a remote.

The gestures are continuously monitored by the webcam mounted on the TV. Each gesture corresponds to a specific command.

We set the random seed so that the results don't vary drastically.
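A minimal sketch of the seeding step (the seed value 30 is a hypothetical choice; any fixed integer works):

```python
import random

import numpy as np

# Hypothetical seed value; any fixed integer works.
SEED = 30
random.seed(SEED)
np.random.seed(SEED)
# In a Keras/TensorFlow notebook you would also call tf.random.set_seed(SEED).
```

With the seeds fixed, shuffles and weight initialisations are reproducible across restarts.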

In this block, you read the folder names for training and validation. You also set the batch_size here. Note that you set the batch size in such a way that you are able to use the GPU at full capacity: you keep increasing the batch size until the machine throws an out-of-memory error.
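The block above can be sketched as follows. The 'folder;gesture_name;label' line layout of the listing files and the starting batch size are assumptions for illustration:

```python
import numpy as np

def load_doc(path, seed=30):
    """Read the train/val listing file and shuffle its lines.

    Each line is assumed to look like 'folder;gesture_name;label'
    (the exact layout of train.csv / val.csv is an assumption here).
    """
    with open(path) as f:
        lines = f.readlines()
    return np.random.default_rng(seed).permutation(lines)

# Hypothetical starting point; keep increasing it until the GPU runs out of memory.
batch_size = 32
```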

EDA - Analysis of the Images to see the size and augmentation requirements

Generator

This is one of the most important parts of the code. The overall structure of the generator has been given. In the generator, you preprocess the images (the dataset contains images of two different dimensions) and create batches of video frames. You have to experiment with img_idx, y, z and normalization to get high accuracy.

Note here that a video is represented above in the generator as (number of images, height, width, number of channels). Take this into consideration while creating the model architecture.
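A self-contained sketch of the generator structure described above, yielding batches shaped (batch_size, number of images, height, width, channels). The `load_frame` helper is a hypothetical stand-in for the real disk I/O and resizing:

```python
import numpy as np

def load_frame(folder, frame_no, y, z):
    # Stand-in for reading one frame from disk and resizing it to (y, z, 3);
    # it fabricates random pixels so the sketch runs end to end.
    return np.random.randint(0, 256, size=(y, z, 3)).astype(np.float32)

def generator(folders, labels, batch_size, img_idx, y=100, z=100, num_classes=5):
    """Yield batches shaped (batch_size, len(img_idx), y, z, 3).

    img_idx picks which frames of each video to keep; pixel values are
    normalized to [0, 1]. Handling of the remainder batch is omitted
    for brevity.
    """
    while True:
        perm = np.random.permutation(len(folders))
        for start in range(0, len(folders) - batch_size + 1, batch_size):
            batch_data = np.zeros((batch_size, len(img_idx), y, z, 3))
            batch_labels = np.zeros((batch_size, num_classes))
            for i, idx in enumerate(perm[start:start + batch_size]):
                for j, frame_no in enumerate(img_idx):
                    batch_data[i, j] = load_frame(folders[idx], frame_no, y, z) / 255.0
                batch_labels[i, labels[idx]] = 1  # one-hot encode the gesture
            yield batch_data, batch_labels
```

The yielded shape is exactly what the Conv3D input layer must expect.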

Test the Generator Function

Model

Here you make the model using the different functionalities that Keras provides. Remember to use Conv3D and MaxPooling3D, not Conv2D and MaxPooling2D, for a 3D convolution model; you would use TimeDistributed instead when building a Conv2D + RNN model. Also remember that the last layer is the softmax. Design the network so that the model gives good accuracy with the fewest parameters, so that it can fit in the memory of the webcam.
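A minimal Conv3D sketch of the points above; the layer widths and dropout rate are assumptions, not the final design:

```python
from tensorflow.keras.layers import Conv3D, Dense, Dropout, Flatten, MaxPooling3D
from tensorflow.keras.models import Sequential

def build_conv3d(frames=18, height=100, width=100, num_classes=5):
    """Minimal Conv3D model sketch (sizes are assumptions).

    The input shape (frames, height, width, channels) matches what the
    generator yields, and the last layer is a softmax over the 5 gestures.
    """
    return Sequential([
        Conv3D(16, (3, 3, 3), activation='relu', padding='same',
               input_shape=(frames, height, width, 3)),
        MaxPooling3D((2, 2, 2)),
        Conv3D(32, (3, 3, 3), activation='relu', padding='same'),
        MaxPooling3D((2, 2, 2)),
        Flatten(),
        Dense(64, activation='relu'),
        Dropout(0.25),
        Dense(num_classes, activation='softmax'),  # softmax last layer
    ])
```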

Custom functions

Let us first write some custom functions that will be called repeatedly:

  1. generator() function, which is already written above
  2. trainer() function to fit the model and track its accuracy and loss
  3. modelplot() to plot the model's accuracy and loss curves
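A sketch of what modelplot() might look like. It assumes the metric key `categorical_accuracy` (the actual key depends on how the model was compiled), and takes the dict found in a Keras History object's `.history` attribute:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

def modelplot(history, acc_key='categorical_accuracy'):
    """Plot train/validation accuracy and loss side by side.

    `history` is the dict held in a Keras History object's .history
    attribute; acc_key depends on the metrics passed at compile time.
    """
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(12, 4))
    ax_acc.plot(history[acc_key], label='train')
    ax_acc.plot(history['val_' + acc_key], label='validation')
    ax_acc.set_title('Model accuracy')
    ax_acc.legend()
    ax_loss.plot(history['loss'], label='train')
    ax_loss.plot(history['val_loss'], label='validation')
    ax_loss.set_title('Model loss')
    ax_loss.legend()
    return fig
```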

Sample Model to Test

Deciding the batch size, image size and number of frames

The time taken and the memory needed to train the model are greatly affected by the batch size, image size and number of frames. We proceed as follows:
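A rough sanity check of the trade-off above (the helper and its numbers are illustrative; it counts only the float32 input batch, not activations or weights):

```python
def batch_memory_mb(batch_size, frames, height, width, channels=3, bytes_per_value=4):
    """Rough float32 footprint of one input batch in MiB.

    Activations and weights are excluded; this is just a first sanity
    check before pushing the batch size up.
    """
    return batch_size * frames * height * width * channels * bytes_per_value / 2**20
```

Doubling any one of batch size, image area or frame count doubles this footprint, which is why the three are tuned together.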

Choosing Batch Size

Choose number of frames
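One simple way to choose the frames, sketched below: keep evenly spaced indices out of each video's 30 frames (30 per video comes from the dataset; keeping 18 is an assumed choice):

```python
import numpy as np

def pick_frames(total_frames=30, num_frames=18):
    """Evenly spaced frame indices covering the whole clip."""
    return np.round(np.linspace(0, total_frames - 1, num_frames)).astype(int)
```

The resulting array is what the generator receives as img_idx.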

Now that these parameters are decided, let us start building the model and settle on the remaining choices such as filter size, number of layers, etc.

Model Architecture 1 - Conv3D model with filter size (3,3,3)

Model Architecture 2 - Conv3D model with filter size (2,2,2)

Model Architecture 3 - Conv3D model with filter size (3,3,3) and Augmentation

Results of the above three models:

Model Architecture 4 - Conv3D model with filter size (3,3,3), four Conv3D layers and two Dense layers

Model Architecture 5 - Conv3D model with filter size (3,3,3) and Leaky ReLU

Model Architecture 6 - CNN LSTM Model

Model Architecture 7 - CNN+GRU Model

Model Architecture 8 - Transfer Learning with MobileNet + GRU Model
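A minimal sketch of this architecture: a MobileNet backbone applied per frame via TimeDistributed, pooled, then a GRU over time. `weights=None` keeps the sketch offline; the project would pass `weights='imagenet'` to actually transfer-learn, and the frame count, image size and GRU width here are assumptions:

```python
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import (GRU, Dense, GlobalAveragePooling2D,
                                     TimeDistributed)
from tensorflow.keras.models import Sequential

def build_mobilenet_gru(frames=18, size=120, num_classes=5):
    """MobileNet features per frame, then a GRU over the frame sequence."""
    base = MobileNet(weights=None, include_top=False,
                     input_shape=(size, size, 3))
    return Sequential([
        # Apply the 2D backbone to every frame of the clip.
        TimeDistributed(base, input_shape=(frames, size, size, 3)),
        TimeDistributed(GlobalAveragePooling2D()),
        GRU(64),  # aggregate the per-frame features over time
        Dense(num_classes, activation='softmax'),
    ])
```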

Model Architecture 9 - VGG16 LSTM Model

Winner #1 is the Transfer Learning Model - MobileNet + GRU Model (RNN)

Load the winner model and test a sample
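A sketch of the load-and-predict step. The checkpoint path is hypothetical; in the notebook it would point to the best .h5 file written during training:

```python
import numpy as np
from tensorflow.keras.models import load_model

def predict_sample(model_path, batch):
    """Load a saved checkpoint and return the predicted class per video.

    `model_path` is a hypothetical path to a saved .h5 checkpoint;
    `batch` is one batch yielded by the generator.
    """
    model = load_model(model_path)
    probs = model.predict(batch)
    return probs.argmax(axis=1)  # index of the most likely gesture
```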

Winner #2 is Conv3D

Load the winner #2 model and test a sample

Winner #3 is CNN+GRU Model

Load the winner #3 model and test a sample